A systematic comparison and evaluation of biclustering methods for gene expression data

Authors

  • Amela Prelic
  • Stefan Bleuler
  • Philip Zimmermann
  • Anja Wille
  • Peter Bühlmann
  • Wilhelm Gruissem
  • Lars Hennig
  • Lothar Thiele
  • Eckart Zitzler
Abstract

MOTIVATION: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, often referred to as biclustering, makes it possible to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature. However, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters, or with respect to other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines for choosing a biclustering method are currently available.

RESULTS: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to determine all optimal groupings exactly; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms, together with the reference model and a hierarchical clustering method, on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns in all considered settings.
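The binary reference model can be illustrated with a small sketch. Under this model the expression matrix is discretized into 0s and 1s. An optimal bicluster is here taken to be an inclusion-maximal set of genes and conditions whose cells are all 1. The Python below is a minimal sketch under stated assumptions. The cut-off `threshold` and the helper names `binarize` and `maximal_biclusters` are hypothetical. The sketch does not reproduce the Bimax divide-and-conquer strategy. Instead it swaps in a simpler, and for large matrices much slower, enumeration that intersects the sets of 1-columns of the rows.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

# A bicluster is a (genes, conditions) pair of row and column indices.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def binarize(matrix: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Set a cell to 1 if its value exceeds `threshold`, else 0 (threshold is an assumption)."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


def maximal_biclusters(binary: Sequence[Sequence[int]]) -> List[Bicluster]:
    """Enumerate every inclusion-maximal submatrix of 1s in `binary`.

    Not the Bimax divide-and-conquer procedure: this simpler scheme collects all
    intersections of row supports, which are exactly the condition sets of the
    maximal biclusters. Their number can grow exponentially, so this only suits
    small matrices.
    """
    # Column indices holding a 1, per gene (row).
    supports = [frozenset(j for j, v in enumerate(row) if v == 1) for row in binary]

    closed: Set[FrozenSet[int]] = set()
    for support in supports:
        # Intersect the new row with every set found so far, and keep the row itself.
        closed |= {support & seen for seen in closed} | {support}

    biclusters: List[Bicluster] = []
    for conditions in closed:
        if not conditions:
            continue  # ignore biclusters without any condition
        # Every gene that is 1 in all chosen conditions.
        genes = frozenset(i for i, s in enumerate(supports) if conditions <= s)
        biclusters.append((genes, conditions))
    # Deterministic order: by sorted gene indices, then sorted condition indices.
    return sorted(biclusters, key=lambda b: (sorted(b[0]), sorted(b[1])))


if __name__ == "__main__":
    # Toy expression matrix: 4 genes (rows) x 4 conditions (columns).
    expression = [
        [2.1, 1.8, 0.2, 0.1],
        [1.9, 2.5, 1.7, 0.3],
        [0.1, 1.6, 2.2, 0.4],
        [0.3, 0.2, 0.1, 1.9],
    ]
    for genes, conditions in maximal_biclusters(binarize(expression, threshold=1.0)):
        print(sorted(genes), sorted(conditions))
```

On this toy matrix the sketch reports five maximal biclusters, for example genes 0 and 1 under conditions 0 and 1. The abstract describes Bimax as a fast algorithm that determines all optimal groupings exactly. This sketch gives up that speed for brevity.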

Similar resources

Comparison of Biclustering Methods: A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data

Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, makes it possible to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness h...

Application of biclustering with the "large average submatrices" method to gene expression data from DNA microarrays

Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. This technology makes it possible to analyze the expression levels of thousands of genes simultaneously under different conditions, and it yields massive amounts of information. While traditional clustering methods, such as hierarchical and K-means clustering, have been...

BiFree: An Efficient Biclustering Technique for Gene Expression Data Using Two Layer Free Weighted Bipartite Graph Crossing Minimization

Conventional clustering techniques for gene expression data provide a global view of the data. From a biological perspective, a local view, with simultaneous grouping of genes and conditions, is essential for better analysis of gene expression data. Several biclustering techniques have been proposed in the literature based on different problem formulations. Therefore, it is difficult to compare th...

Improved biclustering of microarray data demonstrated through systematic performance tests

A new algorithm is presented for fitting the plaid model, a biclustering method developed for clustering gene expression data. The approach is based on speedy individual differences clustering and uses binary least squares to update the cluster membership parameters, making use of the binary constraints on these parameters and simplifying the other parameter updates. The performance of both algor...

BiRange: An Efficient Framework for Biclustering of Gene Expression Data Using Range Bipartite Graph

Biclustering is a vital data mining tool that is commonly applied to microarray data sets for analysis tasks in bioinformatics research and medical applications. There has been extensive research on biclustering of gene expression data arising from microarray experiments. This technique is an important analysis tool in gene expression measurement, when some genes have multiple functions and e...

Journal:
  • Bioinformatics

Volume 22, Issue 9

Publication date: 2006